skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Fabish, J"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Predictive modeling of a rare event using an unbalanced data set leads to poor prediction sensitivity. Although this obstacle is often accompanied by other analytical issues such as a large number of predictors and multicollinearity, little has been done to address these issues simultaneously. The objective of this study is to compare several predictive modeling techniques in this setting. The unbalanced data set is addressed using four resampling methods: undersampling, oversampling, hybrid sampling, and ROSE synthetic data generation. The large number of predictors is addressed using penalized regression methods and ensemble methods. The predictive models are evaluated in terms of sensitivity and F1 score via simulation studies and applied to the prediction of food deserts in North Carolina. Our results show that balancing the data via resampling methods leads to an improved prediction sensitivity for every classifier. The application analysis shows that resampling also leads to an increase in F1 score for every classifier while the simulated data showed that the F1 score tended to decrease slightly in most cases. Our findings may help improve classification performance for unbalanced rare event data in many other applications. 
    more » « less